AITopics | inference benchmark

Collaborating Authors

inference benchmark

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments

Fursin, Grigori

arXiv.org Artificial IntelligenceJun-24-2024

In this white paper, I present my community effort to automatically co-design cheaper, faster and more energy-efficient software and hardware for AI, ML and other popular workloads with the help of the Collective Mind framework (CM), virtualized MLOps, MLPerf benchmarks and reproducible optimization tournaments. I developed CM to modularize, automate and virtualize the tedious process of building, running, profiling and optimizing complex applications across rapidly evolving open-source and proprietary AI/ML models, datasets, software and hardware. I achieved that with the help of portable, reusable and technology-agnostic automation recipes (ResearchOps) for MLOps and DevOps (CM4MLOps) discovered in close collaboration with academia and industry when reproducing more than 150 research papers and organizing the 1st mass-scale community benchmarking of ML and AI systems using CM and MLPerf. I donated CM and CM4MLOps to MLCommons to help connect academia and industry to learn how to build and run AI and other emerging workloads in the most efficient and cost-effective way using a common and technology-agnostic automation, virtualization and reproducibility framework while unifying knowledge exchange, protecting everyone's intellectual property, enabling portable skills, and accelerating transfer of the state-of-the-art research to production. My long-term vision is to make AI accessible to everyone by making it a commodity automatically produced from the most suitable open-source and proprietary components from different vendors based on user demand, requirements and constraints such as cost, latency, throughput, accuracy, energy, size and other important characteristics.

automation recipe, cknowledge, software and hardware, (10 more...)

arXiv.org Artificial Intelligence

2406.16791

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Industry: Information Technology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Quick Dense Retrievers Consume KALE: Post Training Kullback Leibler Alignment of Embeddings for Asymmetrical dual encoders

Campos, Daniel, Magnani, Alessandro, Zhai, ChengXiang

arXiv.org Artificial IntelligenceJun-1-2023

In this paper, we consider the problem of improving the inference latency of language model-based dense retrieval systems by introducing structural compression and model size asymmetry between the context and query encoders. First, we investigate the impact of pre and post-training compression on the MSMARCO, Natural Questions, TriviaQA, SQUAD, and SCIFACT, finding that asymmetry in the dual encoders in dense retrieval can lead to improved inference efficiency. Knowing this, we introduce Kullback Leibler Alignment of Embeddings (KALE), an efficient and accurate method for increasing the inference efficiency of dense retrieval methods by pruning and aligning the query encoder after training. Specifically, KALE extends traditional Knowledge Distillation after bi-encoder training, allowing for effective query encoder compression without full retraining or index generation. Using KALE and asymmetric training, we can generate models which exceed the performance of DistilBERT despite having 3x faster inference.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2304.01016

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Dense Sparse Retrieval: Using Sparse Language Models for Inference Efficient Dense Retrieval

Campos, Daniel, Zhai, ChengXiang

arXiv.org Artificial IntelligenceMar-31-2023

Vector-based retrieval systems have become a common staple for academic and industrial search applications because they provide a simple and scalable way of extending the search to leverage contextual representations for documents and queries. As these vector-based systems rely on contextual language models, their usage commonly requires GPUs, which can be expensive and difficult to manage. Given recent advances in introducing sparsity into language models for improved inference efficiency, in this paper, we study how sparse language models can be used for dense retrieval to improve inference efficiency. Using the popular retrieval library Tevatron and the MSMARCO, NQ, and TriviaQA datasets, we find that sparse language models can be used as direct replacements with little to no drop in accuracy and up to 4.3x improved inference speeds

artificial intelligence, language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2304.00114

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Nvidia wins new AI inference benchmark for data center and edge SoC workloads ZDNet

#artificialintelligenceNov-7-2019, 18:41:34 GMT

Nvidia is touting another win on the latest set of MLPerf benchmarks released Wednesday. The GPU maker said it posted the fastest results on new MLPerf inference benchmarks, which measured the performance of AI inference workloads in data centers and at the edge. MLPerf's five inference benchmarks, applied across four inferencing scenarios, covered AI applications such as image classification, object detection and translation. Nvidia topped all five benchmarks for both data center-focused scenarios (server and offline), with its Turing GPUs. Meanwhile, the Xavier SoC turned in the highest performance among commercially available edge and mobile SoCs that submitted for MLPerf under edge-focused scenarios of single-stream and multi-stream.

benchmark, inference benchmark, win new ai inference benchmark, (9 more...)

#artificialintelligence

Country: North America > United States > California > Alameda County > Berkeley (0.06)

Industry:

Information Technology > Hardware (0.98)
Information Technology > Services (0.96)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.81)
Information Technology > Artificial Intelligence > Vision (0.58)

Add feedback